The Education Data Portal bridges the gap between data availability and data accessibility.
What do I mean by the availability-accessibility gap?
How does the portal bridge this gap?
Why does bridging this gap matter?
What do I mean by the data availability-accessibility gap?
Example: Collecting data on COVID in jails and prisons
A spreadsheet
Scanned as a PDF
With dark text on a dark background
And a little blurry
And inconsistent rows and columns
And the occasional coffee spill
Accessible to whom?
How does the portal bridge this gap?
Provides a one-stop shop for 100+ datasets released by government agencies and other institutions on schools, school districts, and colleges in the U.S.
Includes harmonized data and metadata for each dataset
Makes it easier for users to look at trends over time and combine data from different sources
Without the Education Data Portal…
Example: How has tuition at my alma mater changed?
Find the agency collecting the data
Read the data documentation
Download data files for each year
Load each file into R or Python
Notice a few anomalies
Re-read the data documentation
Take an ice cream break (rather than give up)
Update the code per the documentation
Remember to repeat the process again next year
(And hope nothing changes)
This is tedious, error-prone, and simply not fun.
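To make the tedium concrete, here is a minimal sketch of the "load each file, notice anomalies, update the code" loop. It assumes two yearly files whose column names drifted between releases. The file contents, the tuition values, the column names other than `unitid` and `tuition_fees_ft`, and the `harmonize` and `stack` helpers are all invented for illustration; none of it is actual IPEDS data.

```python
import csv
import io

# Hypothetical yearly extracts: the tuition column was renamed between releases.
RAW_FILES = {
    2019: "UNITID,TUITION_FT\n173258,100\n",
    2020: "unitid,tuition_fees_ft\n173258,110\n",
}

# Hand-maintained mapping from each release's column names to one common schema.
COLUMN_MAP = {
    "unitid": "unitid",
    "tuition_ft": "tuition_fees_ft",
    "tuition_fees_ft": "tuition_fees_ft",
}


def harmonize(year, text):
    """Parse one year's CSV text and rename its columns to the common schema."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"year": year}
        for name, value in record.items():
            # A KeyError here means a release introduced a column we have not mapped yet.
            row[COLUMN_MAP[name.lower()]] = value
        rows.append(row)
    return rows


def stack(raw_files):
    """Harmonize every year and concatenate the rows in chronological order."""
    combined = []
    for year in sorted(raw_files):
        combined.extend(harmonize(year, raw_files[year]))
    return combined


combined = stack(RAW_FILES)
```

Every new release can break `COLUMN_MAP`, which is why "hope nothing changes" is part of the workflow.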
Using the portal R package…
Example: How has tuition at my alma mater changed?
library(educationdata)
library(tidyverse)  # provides %>% and ggplot()

# Get data
data <- get_education_data(
  level = "college-university",
  source = "ipeds",
  topic = "academic-year-tuition",
  filters = list(
    year = 1990:2020,
    unitid = "173258",
    tuition_type = "4"
  )
)

# Plot data
data %>%
  ggplot(aes(x = year, y = tuition_fees_ft)) +
  geom_line()
Using the portal Python package…
Example: How has tuition at my alma mater changed?
from educationdata import get_education_data

# Get data
data = get_education_data(
    level="college-university",
    source="ipeds",
    topic="academic-year-tuition",
    filters={
        "year": range(1990, 2021),  # 1990 through 2020, inclusive
        "unitid": "173258",
        "tuition_type": "4",
    },
)

# Plot data
data.plot.line(x="year", y="tuition_fees_ft")
Using the portal Stata package…
Example: How has tuition at my alma mater changed?
* Get data
educationdata using ///
    "college ipeds academic-year-tuition", sub( ///
        year=1990/2020 ///
        unitid=173258 ///
        tuition_type=4 ///
    )

* Plot data
twoway (line tuition_fees_ft year)
Using the portal Data Explorer…
Example: How has tuition at my alma mater changed?
TODO: Add video
Why do I think the portal bridges this gap so effectively?
By focusing on the underlying API
By focusing on data documentation
The underlying API
120+ data endpoints (with the data)
12+ metadata endpoints (about the data)
All other tools, packages, and documentation are built on these endpoints
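Because every tool sits on the same endpoints, a request is ultimately just a URL. The sketch below builds one for the tuition example. The base URL and the `level/source/topic/year` path layout are my assumptions about how the API is organized, and `endpoint_url` is a hypothetical helper, so check the portal's API documentation before relying on it.

```python
from urllib.parse import urlencode

# Assumed base URL and path layout; verify against the portal's API documentation.
BASE_URL = "https://educationdata.urban.org/api/v1"


def endpoint_url(level, source, topic, year, **filters):
    """Build the URL for one data endpoint and year, with optional filters."""
    path = f"{BASE_URL}/{level}/{source}/{topic}/{year}/"
    # Sort the filters so the same request always yields the same URL.
    query = urlencode(sorted(filters.items()))
    return f"{path}?{query}" if query else path


url = endpoint_url(
    "college-university",
    "ipeds",
    "academic-year-tuition",
    2020,
    unitid="173258",
    tuition_type="4",
)
print(url)
```

The R, Python, and Stata packages shown earlier can be thought of as wrappers that assemble requests like this one, loop over years, and return a single table.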
Data documentation
Considered a first-order priority
For humans and machines
With details on demand
Why do I think the portal bridges this gap so effectively?
By focusing on the underlying API and data documentation
Why does bridging this gap matter?
Different people ask different—and important—questions.